ABSTRACT
The purpose of this study is to investigate the relationship between the
phonetic content of prose texts in English and the emotion that the texts
inspire, namely the effect of vowel-consonant bi-phones on subjects’
evaluation of positive or negative emotional valence when reading. The
methodology is based on data from an experiment in which the participants,
native speakers of three different languages, evaluated the valence evoked
in them by one-page texts from English books. The sub-lexical level of the
texts was obtained using phonetic transcriptions of the words and their
further decomposition into vowel-consonant bi-phones. The statistical
investigation relies on density measures of the investigated bi-phones over
each text as a whole. The result shows that there exists a correlation
between the obtained sub-lexical representation and the valence perceived
by the readers. Concerning the type of the consonants in the bi-phones
(abrupt or sonorant), the influence of the abrupt bi-phones is stronger.
However, subsets of both types of bi-phones showed relatedness with the
emotional valence conveyed by the texts. In conclusion, speech expressed in
written form is laden with emotional valence even when the words’
lexicological meaning is not taken into consideration and the words are
apprehended as mere phonetic constructs. This prompts the hypothesis that
word semantics itself is partly underpinned by some mental, emotion-related
level of conceptualization, influenced by sounds. For practical purposes,
the result suggests that, based on the syllabic content of a text, it
should be possible to predict the valence that the text would inspire in
its readers.
© 2019 IJCRSEE. All rights reserved.


1. INTRODUCTION

Several studies have shown that speakers from different cultures detect
similar emotional content solely with reference to speech flow, even when
the speech consists of pseudo-words (e.g. Scherer, 2000; Scherer et al.,
2001). In the present study, focusing on the relationship between emotions,
language, and speech, I have assumed from such evidence that several
aspects of speech, including its phonetic content, display some basic,
species-typical pattern that involves features related to emotion transfer.
The line of reasoning followed herein questions the principle of
arbitrariness proposed in the theory of linguistic signs (De Saussure,
1916/1983).

Defining the essential attributes, functions and substrates of emotion has
proved to be a particularly elusive problem, and overall consensus remains
lacking (see Kleinginna and Kleinginna, 1981; Russell, 2003; Hamann, 2012,
for overviews). There
exist two main models of emotions used in 
machine emotion recognition — a discrete 
model of emotion categories and a continuous 
model of emotion dimensions (for a recent 
overview see Burton, 2015). The set of emotion categories (such as anger,
disgust, fear, happiness, sadness, and surprise), as proposed in the works
of Ekman (1999) and further enlarged, has become accepted by the greater
scientific community as a foundation for the development of contemporary
emotion recognition systems. The other model portrays emotional phenomena
according to their discernible attributes, considered as emotion
dimensions. Following this model (suggested by Russell and Mehrabian,
1977), emotions are usually described with three dimensions: valence,
arousal and dominance (VAD). Valence indicates whether emotions are
pleasant (positive) or unpleasant (negative), arousal indicates the degree
to which they are exciting or calming, and dominance is a rating of one’s
own status in relation to an emotion-causing occurrence. During the past
few years,
the VAD model has become intensively used 
in systems for emotion recognition as well as 
in the domain of sentiment analysis, prompt- 
ing various efforts to bind dimensional mod- 
els to emotional categories (e.g. Buechel and 
Hahn, 2017). However, the theoretical aspects 
regarding the brain mechanisms underlying 
the described formal models and the relation 
between them remain controversial.

In recent years, brain-imaging tech- 
niques have allowed the identifying of clear 
neural signatures that correspond to basic 
emotions. These patterns are involved deeply 
in the brain’s typical multimodal fashion, ex- 
hibiting specific activations within a distrib- 
uted network of cortical and subcortical ar- 
eas (e.g., Saarimaki et al., 2015). In addition, 
questions regarding emotion dimensions and 
their correlates in terms of brain-functioning 
have been investigated. For example, Maddock
et al. (2003) report fMRI evidence from
a valence decision task showing the existence 
of specific brain regions activated only by 
unpleasant words and regions activated only 
by pleasant words. A recent study of brain- 
impaired subjects showed that both emotional 
valence and basic emotions are related to se- 
mantic memory, including for stimuli based 
on speech prosody (Macorr et al., 2019). Such 
results suggest that emotional valence is re- 
lated to some semantic level of functioning of 
the human brain. 

A matter of direct relevance to this study 
concerns emotional processing in relation to 
word recognition in reading. On the com- 
bined evidence of fMRI (Marinkovic et al. 
2003) and event-related potential time-course 
studies (cf. Abbassi, Kahlaoui et al. 2011, for 
a comprehensive review) it is known that, fol- 
lowing initial recognition of visual features 
of words in the visual cortex, the resulting 
orthographic information is transmitted to the auditory
cortex and assigned its corresponding phonetic content
(Van Orden, Johnston and Hale, 1988).
Recognition of lexical meaning
proceeds largely in accord with processing of 
presented spoken word-sequences, that is — as 
perceived speech sounds. 

There is mounting evidence that there 
exist non-arbitrary sound-symbolic patterns 
in language. One of the best-known examples of systematic sound-symbolism,
and the first to be described (Kohler, 1929), is the correspondence of
words predominantly consisting of abrupt consonants (e.g. t, k) with
angular-shaped objects and of words featuring sonorant consonants
(e.g., m, l) with curvaceous objects. Subsequent research related to emotions
(Fantz and Miranda, 1975; Leder, Tinio and 
Bar, 2011) has demonstrated an inborn human 
emotional preference for curvaceous shapes. 

During the last decade, a large number of studies in sound-symbolism have
concentrated on the words’ iconicity, a phenomenon considered as related
mainly to the association between the sounds and the words’ lexical meaning
(e.g., Imai and Kita, 2014; Perlman, Dale and Lupyan, 2015; Edmiston et
al., 2018; Winter et al., 2017; Jones and Vigliocco, 2017; Sidhu and
Pexman, 2018, and many other works). The results of these numerous
investigations confirm the relations between sound and meaning and propose
that the phonetic content of the words could arise based on mechanisms
related to perception, sensation, repeated imitation and so forth. These
results suggest that sound-symbolic mechanisms have a general
species-specific character. Indeed, in their recent experimental study,
D’Anselmo and colleagues (D’Anselmo et al., 2019) did not find significant
differences between Italian and Polish participants in successfully
guessing the correct meaning of words in unknown languages. The authors
conclude that there exist sound-symbolic patterns that are independent of
the mother tongue of the listener.

The connection between emotions and 
sound symbolic features also started to gain 
attention and led to results showing, for ex- 
ample, that taste and smell words form an af- 
fectively loaded part of the English lexicon 
(Winter, 2016). 

The study of phonological effects on 
emotion has centered predominantly on po- 
etry. The reasoning related to poetry origi- 
nates from the Russian Formalists where the 
sound in poetry was first examined in a sys- 
tematic fashion (for descriptions see Trotsky, 
1957; Jakobson, 1960; Shklovsky, 1990; Man- 
delker, 1983). Subsequent research, focusing 
on phoneme-level effects on experienced emo- 
tional state, has examined diverse examples of 
English poetry ranging from Byron to Beatles’ 
song lyrics (see Whissell, 1999). Recent emo- 
tion-focused studies of German poetry under- 
taken by Aryani, Ullrich, and colleagues (Ary- 
ani et al., 2013; Aryani et al., 2016; Ullrich et 
al, 2017) have also demonstrated the existence 
of a relation between the phonetic content of 
poetic texts and the emotion that they convey.

The investigations of more general emo- 
tion-effects implicated in sound-symbolism 
have revealed important findings. Kawahara 
and Shinohara (2012), for example, conduct- 
ed experiments, demonstrating a tripartite 
trans-modal symbolic relationship between 
three domains of cognition (auditory sounds, 
visual shapes, and emotions). A recent study 
by Adelman and colleagues (Adelman et al., 
2018), based on data-analysis of corpora rep- 
resenting five languages of Europe, revealed 
the existence of strong correlations between, 
for example, abrupt initial-position phonemes 
and highly negative, arousing words as well 
as between slow onset initial phonemes and 
positive words. These recent studies concen- 
trate on the general effect of single phonemes. 

The idea that syllables represent basic 
compositional elements of speech is not new 
in linguistics (see, e.g., Ito, 2018). The ra- 
tionale of consonant-vowel speech composi- 
tional sequences is incorporated in the written 
form of several languages, for example, Ara- 
bic. A syllabic structure is detectable, too, in 
sign-languages. For example, a recent study 
by Gökgöz (2018) revealed that Turkish sign-
language has a syllabic composition. The
syllables-emotion relation is acknowledged 
and used in speech-emotion recognition (e.g. 
Origlia et al., 2014). The role of syllables as 
emotional indicators was investigated, too, in 
terms of micro-prosody conveyed by syllabic 
pitch-profiles (Brandt and Bennett, 2015). 

The study proposed here investigates 
the relation between the syllabic content of 
prose texts in the English language and the 
conveyed emotional valence with the hypoth- 
esis that such a relation exists. 


2. MATERIALS AND METHODS 
2.1. Goal and approach 


The goal of the presented work is to explore statistically the relation
between the vowel-consonant syllabic content of prose texts and the
emotional valence that the text inspires in the readers, in order to
provide some phonological and statistical details of such a relation, if it
exists.

The approach used examines only the phonetic content of the word-sequences,
without regard to their lexicological meaning, speech prosody, and other
features of language that are commonly considered related to communicating
emotion.

The study proposed here was inspired by the analysis of the tripartite
trans-modal symbolic relationship by Kawahara and Shinohara (2012). Their
study was based on pseudo-words composed of two consonant-vowel syllables,
organized in two types of phonological stimuli: “abrupt” (“Stop condition”,
e.g. [tadi]) and “sonorant” (“Sonorant condition”, e.g. [maji]). The
obtained result, based on auditory input of such pseudo-words, confirmed
that English speakers associate oral stop consonants with angular shapes.
It also showed that oral stops are associated with emotion types that
involve abrupt onsets (e.g., “shocked” and “surprised”) and that angular
shapes are associated with those same types of emotions. The study led to
the same kind of conclusions concerning the sonorant condition.

In the approach proposed in the present study, I supposed that features
such as “oral stop” and “oral passage” are clearly detectable in biphone
syllables of the vowel-consonant type. The statistical treatment of the
experimental data performed here is based solely on this syllabic scheme.
Two forms of vowel-consonant biphones, vowel-abrupt (e.g., /ot/) and
vowel-sonorant (e.g., /um/), were deployed. On the presumption that the
emotional dimension of valence has a neuropsychological basis, I undertook
an exploration of the relation between the English language, represented at
the vowel-consonant sub-lexical level, and emotional valence.


2.2. Experiment and data 


In testing the hypothesis, the aim was to 
extract from textual data some metadata that 
is necessary for statistically investigating the 
presence of a relatedness of the phonetic char- 
acteristics of texts and the valence inspired in 
the readers. This was done based on experi- 
mental data, where the experiment was de- 
signed for this specific purpose (Figure 1). 

Figure 1. General scheme of the study


The experiment was organized and con- 
ducted within a student’s project at the New 
Bulgarian University. Twenty approximately 
one-page-long passages of text were selected, 
each evoking a certain scenario that would 
predictably inspire a given emotional valence. 
The passages (hereafter called experimental 
texts) were taken from works by Leigh Bar- 
dugo, Charles Dickens, Neil Gaiman, Robin 
Hobb, Derek Landy, Brandon Sanderson, and 
J. R. R. Tolkien.

The participants in our experiment were 
residents of different countries, were native 
speakers of different languages (Arabic=2, 
Bulgarian=10, English=5), and all were high- 
ly proficient English language users. Their 
task was to evaluate each text overall and not 
its parts. 

We submitted four texts (that we judged to differ with regard to valence)
to each participant to read silently and to evaluate the valence of each.
All participants received the following instruction in written form:

“Many texts invoke certain emotions in the reader. For example, ‘A rose by
any other name would smell as sweet’ by Shakespeare invokes joy, positive
emotion, while ‘As the light begins to intensify, so does my misery, and I
wonder how it is possible to hurt so much when nothing is wrong’ by Tabitha
Suzuma - sadness, negative emotion. You will receive four texts in English.
They are excerpts (a page long) from famous literary works, written by
English natives. Evaluate what emotion is invoked in you by each of these
texts by using this scale:

Very negative, Negative, Does not arouse emotion, Positive, Very positive”

The received written evaluations were 
assigned numeric marks from -2 (very nega- 
tive) to +2 (very positive). The 20 texts were 
evaluated by the 17 participants, where each 
text was evaluated by at least 3 subjects (Max 
6, Mean 4.7). 
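
The scoring step described above can be sketched as follows. This is only an illustration: the label-to-mark mapping follows the scale given in the instruction, but the aggregation of the marks into one score per text is assumed here to be the arithmetic mean, and the function name `mean_valence` and the sample ratings are hypothetical, not taken from the experimental data.

```python
from statistics import mean

# Numeric marks for the five-point scale given in the instruction.
SCALE = {
    "Very negative": -2,
    "Negative": -1,
    "Does not arouse emotion": 0,
    "Positive": 1,
    "Very positive": 2,
}


def mean_valence(ratings):
    """Return the mean numeric mark per text (assumed aggregation).

    `ratings` is an iterable of (text_id, label) pairs, one per evaluation.
    """
    by_text = {}
    for text_id, label in ratings:
        by_text.setdefault(text_id, []).append(SCALE[label])
    return {text_id: mean(marks) for text_id, marks in by_text.items()}


# Hypothetical ratings, for illustration only.
sample = [
    (1, "Positive"), (1, "Very positive"), (1, "Positive"),
    (2, "Negative"), (2, "Very negative"), (2, "Does not arouse emotion"),
]
scores = mean_valence(sample)
print(scores)
```

Keeping the mapping in a single dictionary makes it easy to check that every written evaluation is converted with the same marks.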


2.3. Phonetic representation and 
sub-lexical metadata 


The textual data was stored in a database 
(see Figure 3) and further treated in order to 
extract its phonological characteristics as a 
stream of syllables. To perform the extraction 
of syllabic metadata from the raw text, 6 main 
steps were accomplished, as illustrated in Fig- 
ure 2. 

The sentences were first decomposed into word-forms. This step identified
10,692 word occurrences in the experimental texts. Next, the words’
phonetic transcriptions (following the commonly adopted IPA standard of
English phonetic transcription) were downloaded from the Internet using
online dictionaries. A number of words (about 2,000) required a manual
search. The dictionary of transcribed words obtained after this step
contains 5,767 phonetically transcribed words. The vocabulary used in the
experimental texts contains 2,505 different English word-forms, of which
1,989 (79% of the used vocabulary) were represented with their
transcription.

Figure 2. Stages of the treatment


Next, all word occurrences which are proper nouns were retrieved and marked
so as not to be included in the further analysis (see the sentence example
in Table 1). This step was necessary in order to exclude them from the
emotion-related statistical picture, because such words are not necessarily
chosen by the writer and hence do not reflect the writer’s emotional state
or intent. In this filtering step, 153 occurrences of proper nouns were
excluded. In total, 95% of the word occurrences in the experimental texts
were subjected to further phonetic decomposition and statistical treatment.

The next step was the syllabic decomposition of the transcripts. The
approach proposed here uses biphone syllables built from the entire set of
phonemes in the English language. We took the set V of vowels (/e/, /ɒ/,
/ʌ/, /iː/, etc.) and the set C of consonants (/n/, /d/, /k/, /ʃ/, etc.) and
composed the Cartesian product V × C containing all possible biphone
syllables with one vowel (v ∈ V) as the first element and one consonant
(c ∈ C) as the second element. The biphone syllables obtained this way are
called here “VC-biphones”.

The obtained Cartesian product, con- 
taining all 528 possible combinations of vow- 
els and consonants, was stored in a separate 
table (Table “Syllables” in Figure 3) and used 
to decompose the phonetic transcription of 
words into VC-biphones when parsing. Pars- 
ing of the transcribed words showed that 267 
out of the 528 VC-biphones are used in the 
experimental texts overall. In this way, the 
words’ phonetic transcriptions were present- 
ed as a VC-biphone sequence for each word 
and, consequently, a VC-biphone sequence for 
each text (see the example in Table 1). 
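
The construction of the Cartesian product and the parsing of a transcription into VC-biphones can be sketched as below. This is a minimal illustration, not the procedure actually used: the phoneme inventories are small assumed subsets of the English sets, the function names `tokenize` and `vc_biphones` are hypothetical, and the longest-match tokenization is one possible way to handle two-character phonemes.

```python
from itertools import product

# Small illustrative inventories (assumptions, not the full English sets).
VOWELS = ["iː", "eɪ", "əʊ", "ɪ", "e", "æ", "ɒ", "ʌ", "ə"]
CONSONANTS = ["tʃ", "dʒ", "p", "t", "k", "d", "s", "z", "f", "v",
              "ð", "ʃ", "w", "r", "m", "n", "l"]

# Cartesian product V x C: every vowel followed by every consonant.
VC_BIPHONES = {v + c for v, c in product(VOWELS, CONSONANTS)}


def tokenize(transcription):
    """Split a transcription into phonemes, preferring the longest match."""
    inventory = sorted(VOWELS + CONSONANTS, key=len, reverse=True)
    phonemes, i = [], 0
    while i < len(transcription):
        for phoneme in inventory:
            if transcription.startswith(phoneme, i):
                phonemes.append(phoneme)
                i += len(phoneme)
                break
        else:
            # Symbols outside the inventory (e.g. stress marks) are skipped.
            i += 1
    return phonemes


def vc_biphones(transcription):
    """Return the VC-biphones of one word, in order of appearance."""
    ph = tokenize(transcription)
    return [a + b for a, b in zip(ph, ph[1:])
            if a in VOWELS and b in CONSONANTS]


print(vc_biphones("dɪskʌvər"))  # ['ɪs', 'ʌv', 'ər']
print(vc_biphones("stept"))     # ['ep']
```

A word whose transcription contains no vowel directly followed by a consonant yields an empty list, which corresponds to the words without a VC-biphone in the decomposition.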

Further in this paper, I call the phonological image of the experimental
texts represented by means of VC-biphones a syllabic flow, as shown in the
example provided in Table 1. For the phonological analysis, with regard to
the results reported by Kawahara and Shinohara (2012), the VC-biphones were
divided into an “abrupt” subset (/ɑːθ/, /ɒdʒ/, /æb/, /aʊtʃ/, etc.) and a
“sonorant” subset (/ɑːm/, /ɒn/, /ɔːl/, /aʊl/, /aw/, etc.) following their
consonant (voiced or unvoiced).


Table 1. Example of decomposition of the sentence “Sarene stepped off of
the ship to discover that she was a widow.” into VC-biphones (a query
result).

Word       Word flow   Phonetic transcription   Syllabic flow   Sequence
Sarene     1           @Sarene
stepped    2           stept                    ep              1
off        3           ɒf                       ɒf              1
of         4           əv                       əv              1
the        5           ˈðə
ship       6           ʃɪp                      yp              1
to         7           tuː
discover   8           dɪskʌvər                 ys              1
discover   8           dɪskʌvər                 ʌv              2
discover   8           dɪskʌvər                 ər              3
that       9           ðæt                      æt              1
she        10          ˈʃiː
was        11          wɒz                      ɒz              1
a          12          ˈeɪ
widow      13          wɪdəʊ                    yd              1


It should be noted that the particularities of the standard notation used
in the transcripts led to renaming the phoneme /ɪ/ (as in, e.g., /ɪt/ in
[ˈbenɪfɪt]) to /y/, because the data-treatment product used could not
distinguish the upper case of the letter i, used to denote this phoneme,
from its lower case (as in, e.g., /it/ in [rit]).

As the statistical treatment was based on the results of counting queries,
it was important to protect the calculations from erroneous data-fusion.
The data was therefore organized in a relational database in third normal
form (3NF) with referential integrity (Figure 3). This way of structuring
the data made it possible to perform the obligatory step of data
verification, based on a comprehensible text decomposition as shown in
Table 1, and thereupon to perform the counting queries.
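
A counting query over such a normalized structure might look like the sketch below. The schema is a simplified, hypothetical stand-in for the database in Figure 3: all table and column names (`texts`, `text_words`, `word_syllables`, `syllables`, `kind`) and the inserted rows are assumptions chosen for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce referential integrity

# Hypothetical, simplified schema; names are assumptions.
conn.executescript("""
CREATE TABLE texts (text_id INTEGER PRIMARY KEY);
CREATE TABLE syllables (syllable TEXT PRIMARY KEY, kind TEXT NOT NULL);
CREATE TABLE words (transcription TEXT PRIMARY KEY);
CREATE TABLE text_words (
    text_id INTEGER NOT NULL REFERENCES texts(text_id),
    position INTEGER NOT NULL,
    transcription TEXT NOT NULL REFERENCES words(transcription),
    PRIMARY KEY (text_id, position)
);
CREATE TABLE word_syllables (
    transcription TEXT NOT NULL REFERENCES words(transcription),
    sequence INTEGER NOT NULL,
    syllable TEXT NOT NULL REFERENCES syllables(syllable),
    PRIMARY KEY (transcription, sequence)
);
""")

# Illustrative rows only.
conn.execute("INSERT INTO texts VALUES (1)")
conn.executemany("INSERT INTO syllables VALUES (?, ?)", [
    ("ep", "abrupt"), ("ɪs", "abrupt"), ("ʌv", "abrupt"), ("ər", "sonorant"),
])
conn.executemany("INSERT INTO words VALUES (?)",
                 [("stept",), ("dɪskʌvər",)])
conn.executemany("INSERT INTO text_words VALUES (?, ?, ?)", [
    (1, 1, "stept"), (1, 2, "dɪskʌvər"), (1, 3, "stept"),
])
conn.executemany("INSERT INTO word_syllables VALUES (?, ?, ?)", [
    ("stept", 1, "ep"),
    ("dɪskʌvər", 1, "ɪs"), ("dɪskʌvər", 2, "ʌv"), ("dɪskʌvər", 3, "ər"),
])

# Count the biphone occurrences of each type in text 1.
counts = conn.execute("""
    SELECT s.kind, COUNT(*)
    FROM text_words AS tw
    JOIN word_syllables AS ws ON ws.transcription = tw.transcription
    JOIN syllables AS s ON s.syllable = ws.syllable
    WHERE tw.text_id = ?
    GROUP BY s.kind
    ORDER BY s.kind
""", (1,)).fetchall()
print(counts)
```

Because each word's decomposition is stored once and joined to its occurrences, a repeated word contributes its biphones once per occurrence without any duplicated transcription data.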


Slavova, V. (2019). Towards emotion recognition in texts — a sound-symbolic experiment, International Journal
of Cognitive Research in Science, Engineering and Education (IJCRSEE), 7(2), 41-51













[Database diagram: tables Participants, Evaluation, Texts, Text Evaluated,
Notes Evaluation, Text Contains (TextID, word, position), Words Transcribed
(word form, transcription), Syllables (syllable key, sounds like) and
Contains Syllables (transcribed word, syllable key, sequence)]

Figure 3. Data Base - organization of 
the data for the treatment 


The analysis showed that the 20 experimental texts contained on average 534
words (min. 207, max. 1016). The overall number of VC-biphones detected in
the experimental texts is 9,275, of which 4,624 are abrupt and nearly the
same number, 4,651, are sonorant. The phonetic density of the examined
feature, seen as the percentage of VC-biphones over the total number of
words, is quite low (90%); that is, on average, one word contains less than
one VC-biphone.


3. RESULTS 
3.1. Correlation between emotional 
valence and the volume of abrupt 
syllables 


To evaluate the content of each experi- 
mental text in terms of participation of abrupt 
and sonorant VC-biphones in the syllabic flow, 
the following measure for the Texts’ Syllabic 
Charge (TSC) was applied: 


TSC_{ST,j} = \frac{NSyl_{ST,j}}{NWord_j}        (1)

where the index ST denotes the type of the VC-biphones (abrupt or
sonorant), NSyl_{ST,j} is the number of VC-biphones of type ST that appear
in the experimental text j, and NWord_j is the number of transcribed words
in the experimental text j (j = 1 to 20). The text’s syllabic charge shows
the extent of involvement of abrupt or of sonorant VC-biphones in the
syllabic flow of a given experimental text.
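
As a minimal sketch of this measure, the function below computes the syllabic charge of one text for both types, dividing the number of VC-biphones of each type by the number of transcribed words. The set of sonorant consonants, the function name `syllabic_charge` and the sample data are assumptions made for illustration; every consonant outside the assumed sonorant set is treated as abrupt.

```python
# Assumed sonorant consonants; all other consonants are treated as abrupt.
SONORANT_CONSONANTS = {"m", "n", "ŋ", "l", "r"}


def syllabic_charge(biphones, n_words):
    """Return (TSC_abrupt, TSC_sonorant) for one text.

    `biphones` is a list of (vowel, consonant) pairs found in the text and
    `n_words` is the number of transcribed words in that text.
    """
    n_sonorant = sum(1 for _, c in biphones if c in SONORANT_CONSONANTS)
    n_abrupt = len(biphones) - n_sonorant
    return n_abrupt / n_words, n_sonorant / n_words


# Hypothetical text with 8 transcribed words and 4 VC-biphones.
charge = syllabic_charge(
    [("e", "p"), ("ə", "r"), ("ɪ", "s"), ("ɒ", "n")], n_words=8
)
print(charge)  # (0.25, 0.25)
```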

This measure revealed that the textual data used cannot provide a reliable
statistical picture of the texts’ charge regarding the sonorant
VC-biphones and their correlation with valence (Pearson’s r = -0.06,
p < 0.8).

The text syllabic charge ensuing from abrupt VC-biphones led to a clearer,
though still only marginally significant, result, indicating relatedness
between the volume of abrupt VC-biphones and valence (r = 0.416,
p < 0.068). This suggests that at least some subset of abrupt VC-biphones
has influenced the readers’ emotional judgment. The plot in Figure 4
illustrates this tendency.
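
The link between a correlation coefficient and its significance for 20 texts can be illustrated with the standard t-test for Pearson’s r, t = r·sqrt(n − 2)/sqrt(1 − r²) with n − 2 degrees of freedom. The sketch below is a generic illustration rather than the software actually used for the analysis; the function names are hypothetical. For r = 0.416 and n = 20 it gives t ≈ 1.94, which lies between the two-tailed critical values for 18 degrees of freedom at p = 0.10 (1.734) and p = 0.05 (2.101), in line with the reported p-value.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


print(round(t_statistic(0.416, 20), 2))  # 1.94
```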


[Plot: texts’ syllabic charge (series: abrupt, sonorant)]


Figure 4. Texts’ Valence and Syllabic 
Charge of abrupt and sonorant VC-biphones 


The displayed result is not statistically conclusive, but it nevertheless
indicates a particular tendency. This fact prompted two hypotheses: 1. The
experimental texts are too short to contain a sufficient volume of
VC-biphones; 2. The split of the VC-biphones into abrupt and sonorant
subsets does not lead to a clear-cut statistical picture. The next step was
to investigate the influence of each of the syllables separately,
independently of their abruptness.


3.2. Connection between emotional 
valence and the set of VC-biphones 


To evaluate the degree of involvement of 
each of the VC-biphones in the syllabic flow 
pertaining to each of the experimental texts, a 
Syllabic Ratio per Document (RatSyll) of each of the 267 biphones appearing
in the texts was calculated using the equation:

RatSyll_{ij} = \frac{NSyl_{ij}}{NWord_j \cdot NSyll_j}        (2)

where NSyl_{ij} shows how many times the VC-biphone i (i = 1 to 267)
appears in the experimental text j, NWord_j is the number of transcribed
words in the experimental text j (j = 1 to 20) and NSyll_j is the number of
VC-biphones in the syllabic flow of the experimental text j. As the
Syllabic Ratios take very small values, for reasons related to the
perceptibility of the metadata, they were multiplied by a constant k (a
power of ten). The Syllabic Ratio per Document shows the extent of
involvement of a given VC-biphone in the syllabic flow of a particular
experimental text.
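
A small sketch of the ratio computation is given below. It assumes that the count of a VC-biphone in a text is normalized by both the number of transcribed words and the number of VC-biphones of that text; this form of the normalization, the function name `syllabic_ratio` and the value of the scaling constant `k` in the example are assumptions made for illustration.

```python
def syllabic_ratio(n_syl_ij, n_word_j, n_syll_j, k=1.0):
    """Scaled ratio of one VC-biphone in one text (assumed normalization).

    n_syl_ij: occurrences of biphone i in text j
    n_word_j: number of transcribed words in text j
    n_syll_j: number of VC-biphones in the syllabic flow of text j
    k:        scaling constant (hypothetical value in the example below)
    """
    return k * n_syl_ij / (n_word_j * n_syll_j)


# Hypothetical counts: a biphone seen 5 times in a text of 500 words
# whose syllabic flow holds 400 VC-biphones.
print(syllabic_ratio(5, 500, 400, k=10**5))  # 2.5
```

The unscaled value in this example is 0.000025, which shows why a scaling constant makes the metadata easier to read.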

The correlation analysis of the interdependency between the Valence score
and the Ratios of the VC-biphones showed that the investigated textual data
provides a reliable statistical result (p-value < 0.09) for the 13
VC-biphones listed in Table 2. As can be observed, only two of these
VC-biphones are from the sonorant subset. Some of the investigated
VC-biphones display a high and reliable correlation with one another
because, in general, the words of the language have to respect several
compositional rules.


Table 2. Pronounced correlation of separate VC-biphones with the Valence
score


VC-biphone   Correlation with Valence   p-value   Type
ut            0.615                     0.0039    abrupt
ys            0.552                     0.0116    abrupt
eV            0.489                     0.0286    abrupt
yk            0.487                     0.0293    abrupt
as            0.487                     0.0295    abrupt
et            0.464                     0.0392    abrupt
ydz           0.422                     0.0636    abrupt
eɪp           0.414                     0.0693    abrupt
ut            0.405                     0.0768    abrupt
3.1           0.389                     0.0900    sonorant
wl           -0.413                     0.0702    sonorant
1:8          -0.473                     0.0353    abrupt
eɪs          -0.500                     0.0248    abrupt

Because, due to the nature of the language system itself, the phonemes in
speech are correlated with one another, a dimension reduction using
principal component analysis (PCA) was performed in order to obtain a
comprehensive statistical picture.

The 20 experimental texts were represented in a 267-dimensional statistical
space in which each document is depicted by the Syllabic Ratios derived
from its corresponding syllabic flow. The initial 267-dimensional space was
reduced (retaining components with eigenvalues > 1, which extracted 99% of
the variance) to 18 principal components (PCs).

This reduced space expresses features which are discriminative for the 20
experimental texts and are derived from their syllabic flows. It should be
noted that, due to the rather limited amount of textual data, some of the
VC-biphones were excluded from the analysis because they occur only a few
times in the textual data and/or occur with zero variance in the set of
experimental documents. The number of VC-biphones whose syllabic ratios
were submitted to PCA is 174. The coordinates of the experimental texts
were recalculated in the obtained 18-dimensional space, hereafter called
the PC Syllabic space. Figure 5 shows the experimental texts presented on
the first 3 PCs.
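
The exclusion of rare and zero-variance VC-biphones before the PCA could be sketched as follows. The threshold `min_count`, the function name `select_biphones` and the sample values are hypothetical; the text above does not state the exact cut-off behind “only a few times”.

```python
from statistics import pvariance


def select_biphones(ratios, counts, min_count=3):
    """Return the names of the VC-biphones kept for the analysis.

    ratios: dict mapping a biphone to its list of per-text syllabic ratios
    counts: dict mapping a biphone to its total number of occurrences
    min_count: assumed minimum number of occurrences (hypothetical value)
    """
    kept = []
    for name, values in ratios.items():
        if counts[name] < min_count:
            continue  # occurs only a few times in the textual data
        if pvariance(values) == 0:
            continue  # identical ratio in every text: no variance
        kept.append(name)
    return kept


# Hypothetical per-text ratios for three biphones over three texts.
ratios = {"ep": [0.1, 0.3, 0.2], "ɒz": [0.0, 0.0, 0.5], "uːl": [0.2, 0.2, 0.2]}
counts = {"ep": 12, "ɒz": 1, "uːl": 9}
kept = select_biphones(ratios, counts)
print(kept)  # ['ep']
```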





Figure 5. The 20 experimental texts, 
presented in the PC Syllabic space (the first 
3 PCs). 


The relationship between emotional valence and the phonetic features
expressed by the PC Syllabic space was investigated in terms of the
correlation between the valence scores of the 20 texts and the coordinates
of these texts on each of the 18 axes (the PCs) of the PC Syllabic space.

The correlation analysis revealed that one of the PCs displays a reliable
and important correlation with valence (PC#10: Pearson’s r = 0.607,
p < 0.005). The plot of the linear regression of this PC on Valence is
shown in Figure 6. No other PC displayed a reliable and important
correlation with valence.

This result suggests that there exists a relationship between the
VC-biphone content of a text and the emotional valence that the text
inspires in the reader.

Next, a check was conducted to ascertain whether the abrupt VC-biphones had
a more important impact on the obtained valence-related PC (seen as
expressing a text-discriminative biphonic feature), as such an impact was
suggested by the correlation of the texts’ syllabic charge reported in
section 3.1.


[Plot: PC10 against Valence (subjects’ evaluation); observed values and
linear fit]


Figure 6. Interdependency between the 
valence-scores of the texts and their phonetic 
metadata. 


3.3. Impact of the abrupt and 
sonorant VC-biphones 


The next step was to investigate separately the impact of abrupt and
sonorant VC-biphones on valence using the coefficients in the PCA
transition matrix. The assumption was that the PC which shows a high
correlation with the valence score expresses some “summarized” and
independent feature present in the syllabic flow which is important for
inspiring valence.

The total number of VC-biphones that are projected on the valence-correlated
PC is 174, of which 118 belong to the abrupt subset and only 56 to the
sonorant subset. The plot of the values of their corresponding transition
coefficients is given in Figure 7. As can be seen, both types of
VC-biphones are projected with both positive and negative transition
coefficients. It can also be seen that the subset of abrupt VC-biphones has
a stronger effect and that this effect is mostly in the positive direction,
confirming the result reported in section 3.1. The subset of sonorant
VC-biphones has a smaller effect on valence but, scientifically, this
effect cannot be dismissed. The first 10 “more negatively” and “more
positively” projected VC-biphones of both types are listed in Table 3.


Figure 7. Plot of the influence of the
VC-biphones on the valence-correlated PC, 
expressed by means of their coefficients in the 
component transition matrix. 


It can be seen in Table 3 that the VC-biphones /eɪp/ and /yk/ (/ɪk/),
positively projected on the valence-correlated PC, are among the
VC-biphones listed in Table 2 which, even in the small amount of textual
data used in the experiment, each displayed separately a reliable positive
correlation with valence.

As indicated by the extracted metadata, the vowels included in the
VC-biphones do not appear to be a factor related to the inspired valence.
For example, as seen in Table 3, the vowel /eɪ/ appears in the abrupt /eɪp/
and in the sonorant /eɪm/ VC-biphones, which have a positive influence, and
the same vowel /eɪ/ appears in the abrupt /eɪdʒ/ and in the sonorant /eɪl/,
which have a negative influence.

The same type of contradictory inclusion of vowels can be observed in
Table 2 for the vowel /u:/, which is included in VC-biphones displaying
both negative and positive correlation with valence. Such observations
suggest that a future detailed analysis has to take into consideration more
specific phonological features of the vowels.

Table 3. The VC-biphones assigned with the ten top positive and negative
coefficients by PCA.


Abrupt   coef.     Sonorant   coef.
eɪp       0.709    an          0.467
pb        0.660    9:1         0.390
i:d       0.535    3:n         0.347
ad        0.492    om          0.345
pt        0.464    aun         0.343
yk        0.458    ail         0.267
SUZ       0.434    or          0.249
aut       0.408    un          0.213
ab        0.397    eɪm         0.200
ud        0.392    yo          0.193
eɪdʒ     -0.364    AQ         -0.305
ef       -0.333    aur        -0.297
ap       -0.291    Uar        -0.272
als      -0.256    Dy         -0.196
yvV      -0.251    pm         -0.194
3:d      -0.249    en         -0.163
Ht,      -0.237    ear        -0.159
ep       -0.236    yl         -0.133
yp       -0.219    ul         -0.126
1:0      -0.213    eɪl        -0.120


4. DISCUSSION 


4.1. Next steps to be performed 


The analysis proposed here is based on a subset of syllables whose density
in the texts is quite low. The next step, using the same method, is to
include the consonant-vowel biphones in the syllabic flow and to
investigate a greater amount of valence-evaluated textual data.

The selection of valence-relevant texts 
represents a concern because they have to be 
long enough in order to contain a representa- 
tive subset of syllables and, at the same time, 
the content of each must be such as to evoke, 
in an overall homogenous way, a similar rat- 
ing of valence in readers. This problem can 
be solved using existing corpora of emotion- 
ally evaluated texts combined with convenient 
strategies for assembling valence-homoge- 
neous data-sources with appropriate volume. 


4.2. General discussion 


From the result presented here, it is not possible to explain the principle
by which the observed phonetically accomplished transmission of
emotion-encoding information enters a text’s lexical substance and exerts
its emotional effect on it. To my knowledge, there is no ready-to-hand
explanation of the phenomenon described here. The key questions awaiting
scientific explanation are: 1. how did words come to incorporate these
sound-patterns, and 2. to what extent are the observed patterns
language-dependent? The parts of this puzzle are far from being assembled.
Comparative studies of more languages could identify some general
language-independent features. Key to the puzzle is understanding the
manner in which humans construct their semantic representations.

The result of this study prompts the hypothesis that a concept’s semantics
itself is partly underpinned by some emotion-related level of representing
phenomena, rooted in the long evolutionary development of animal species,
while phenomenal iconicity and sound symbolism are instrumental for its
overt, verbally framed expression as human speech.


5. CONCLUSION


The statistical result shows that there 
exists emotion-related information incorpo- 
rated in the VC-biphone content of the Eng- 
lish language. Thus, the speech is laden with 
emotional meaning even when the words’ 
lexicological meaning is not taken into con- 
sideration and the words are apprehended as 
mere phonetic constructs. The study’s general 
conclusion is that the syllabic composition of 
the words is not arbitrary from the standpoint 
of valence. 

The result suggests that phonological 
characteristics prevailing within a syllabic 
flow should make it possible to predict the 
valence that a given text would inspire in its 
readers. In other words, the features of the syllabic flow could be useful
for the valence classification of texts.

However, the statistical parameters of the proposed analysis suggest that
the concrete details of the revealed interdependency can be correctly
assessed only with experimental data of larger volume.